Abstract

Mollusca is the second most diverse group of animals in the world. Despite their perceived importance,
omics-level studies have seldom been applied to this group of animals largely due to a paucity of genomic
resources. Here, we report the first large-scale gene-associated marker development and evaluation for a 
bivalve mollusc, Chlamys farreri. More than 21,000 putative single-nucleotide polymorphisms (SNPs) 
were identified from the C. farreri transcriptome. Primers and probes were designed and synthesized for 
4500 SNPs, and 1492 polymorphic markers were successfully developed using a high-resolution melting 
genotyping platform. These markers are particularly suitable for population genomic analysis due to high 
polymorphism within and across populations, a low frequency of null alleles, and conformation to neutral 
expectations. Unexpectedly, high cross-species transferability was observed, suggesting that the transferable 
SNPs may largely represent ancestral genetic variations that have been preserved differentially among sub- 
families of Pectinidae. Gene annotations were available for 73% of the markers, and 65% could be anchored 
to the recently released Pacific oyster genome. Large-scale association analysis revealed key candidate genes 
responsible for scallop growth regulation, and provided markers for further genetic improvement of C. farreri 
in breeding programmes. 

Keywords: mollusca; single-nucleotide polymorphism (SNP); transcriptome; high resolution melting (HRM); 
genome-wide association (GWAS) 



1. Introduction

Mollusca is the second most speciose animal phylum,
containing more than 100 000 extant species distributed
across eight major lineages. 1 Molluscs play vital
roles in the structure and functioning of aquatic and ter- 
restrial ecosystems. Many molluscs are important fishery 
and aquaculture species, and some also serve as models 
for studying neurobiology, biomineralization, and adaptive
evolution in response to climate change. 2,3 Despite
their perceived importance, systematic studies of mol- 
luscan biology and evolution remain very limited 
largely due to a paucity of genomic resources. 

Recent advances in next-generation sequencing 
(NGS) technologies (e.g. Roche's 454 and Illumina's
Solexa) now allow for rapid generation of extensive
genomic resources at affordable cost for any organism,
thus opening up opportunities for conducting omics-level
analyses in molluscs. NGS-based genome sequen-
cing has recently been performed for two bivalve 
molluscs, i.e. Pacific oyster (Crassostrea gigas) 4 and 
pearl oyster (Pinctada fucata), 5 providing the first
insights into molluscan genome architecture and the 
genetic basis of stress adaptation, shell formation, and 
pearl biosynthesis. However, whole-genome de novo 
sequencing is currently still costly even using NGS 
platforms and remains out of reach for most laboratories 
focusing on non-model organisms. As an attractive 
alternative, transcriptome sequencing represents a cost- 
effective approach to rapidly expand gene resources, 
and it has been widely applied in many non-model 
organisms. 6 The extensive gene resources generated by 
transcriptome sequencing are very useful not only for 
transcriptome-wide gene characterization and expres- 
sion profiling, but also for large-scale single-nucleotide 
polymorphism (SNP) discovery. Such 'functional' SNPs 
are particularly valuable for quantitative genetic and 
evolutionary studies, because they have a great potential 
for quickly identifying causal genes responsible for 
either complex traits or adaptive evolution. 

Genome-wide scans for detecting quantitative trait 
loci (QTL) for traits of interest or genetic determinants 
for local adaptation usually require a large number of 
genetic markers. SNPs have attracted significant atten- 
tion, as they represent the most abundant class of 
genetic variations in eukaryotic genomes and have 
already become the marker of choice in large-scale 
genotyping applications, such as high-resolution linkage 
and association mapping, genomic selection, and com- 
parative genome analysis. However, SNP markers have 
been insufficiently developed for molluscs in compari- 
son with well-studied model organisms, such as mouse, 
nematode worm, and zebrafish. Even for the molluscs 
that are well characterized at the molecular level 
(e.g. Pacific oyster), only hundreds of markers are avail- 
able. Transcriptome sequencing has recently been 
conducted in many molluscs, 7-11 providing extensive 
resources for large-scale gene-associated SNP mining. 
High-throughput SNP screening and marker develop- 
ment rely on an efficient genotyping platform. High 
resolution melting (HRM) has proved to be an extremely 
powerful technique for rapid profiling of genetic variation
within PCR amplicons. 12,13 HRM has several
advantages over other genotyping methods, such as its 
simplicity, low cost, high sensitivity, and specificity, and 
has been widely applied to many non-model species 
for marker development at medium-to-large scale. 14,15

Our group has recently released a large amount of 
transcriptome data via 454 sequencing for a bivalve 
mollusc, Chlamys farreri (Jones et Preston 1904). 16 
Chlamys farreri is naturally distributed along the seacoasts
of China, Japan, and Korea and is also an important
aquaculture species in China. To date, <100 SNP
markers have been developed for this species, 17,18
which is not sufficient for large-scale genomic analyses, 
such as high-resolution linkage and QTL mapping, asso- 
ciation mapping, and comparative genome analysis. 
Here, we conducted the first large-scale gene-associated
SNP marker development for C. farreri. The 1492 SNP
markers developed in this study were further evaluated
for their usefulness in population genomic, comparative
genomic, and association analyses.



2. Materials and Methods 

2.1. Transcriptome sequences, assembly, and SNP
mining 

The C. farreri transcriptome sequences (SRA ac- 
cession no. SRA030509) were produced by 454 se- 
quencing of cDNA libraries prepared from diverse 
developmental stages and adult tissues. The details of 
sample collection, library preparation, and 454 sequen- 
cing have been described in a previous study. 16 Briefly, 
for larval samples, approximately 1000 parents were 
used for artificial fertilization, while adult tissues were col- 
lected from 30 individuals. To reduce the risk of identify- 
ing artificial SNPs arising from sequencing errors or 
misassembly of paralogous sequences, 454 reads were 
reassembled using the CAP3 programme 19 under very 
stringent assembly criteria (overlap setting: 100 bp and
95% similarity). Using the QualitySNP programme, 20 
putative SNPs were identified in the assembled contigs 
that were covered by at least four reads and had at least 
two reads for each allele. 
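
The read-support filter described above can be sketched as follows. This is a minimal illustration and not the QualitySNP implementation: the function name `is_candidate_snp`, its parameters, and the per-site allele-count input are hypothetical, and only the two thresholds (at least four reads covering the site, at least two reads per allele) come from the text.

```python
from collections import Counter


def is_candidate_snp(allele_counts, min_coverage=4, min_reads_per_allele=2):
    """Return True if a contig site passes the read-support filter.

    allele_counts maps each base observed at the site to its read count
    (hypothetical input format, e.g. derived from an assembly pileup).
    """
    coverage = sum(allele_counts.values())
    if coverage < min_coverage:
        return False
    # A SNP needs at least two alleles, each backed by enough reads.
    supported = [base for base, n in allele_counts.items()
                 if n >= min_reads_per_allele]
    return len(supported) >= 2


# Hypothetical pileups at three contig positions.
sites = {
    "site_a": Counter("CCCTT"),  # 3 C + 2 T reads: passes
    "site_b": Counter("CCCCT"),  # minor allele seen only once: fails
    "site_c": Counter("CCT"),    # fewer than four reads: fails
}
passing = [name for name, counts in sites.items() if is_candidate_snp(counts)]
```

Requiring two reads per allele is what removes most single-read sequencing errors; it also means that rare alleles are missed at low coverage.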

2.2. SNP marker development based on the HRM 
genotyping platform 

SNP markers were developed using a cost-effective 
HRM method 14 that used two PCR primers and one un- 
labeled probe. SNP markers were named as follows: C 
followed by a contig ID, and then S followed by the 
SNP position (bp) within the contig. HRM primers and 
probes were designed following the rules described by 
Wang et al. 14 Primers and probes were synthesized
(Sangon Biotech) and evaluated using six C. farreri individuals.
PCR amplifications were performed in a
10-µl volume composed of 20 ng genomic DNA,
0.1 µM forward primer, 0.5 µM reverse primer, 1.5 mM
MgCl2, 0.2 mM dNTPs (Invitrogen), 1x LCGreen Plus
(Idaho Technology), 1x PCR buffer, and 0.5 U Taq
DNA polymerase (TaKaRa). Thermal cycling began
with an initial denaturing step at 95°C for 5 min, followed
by 60 cycles of 95°C for 40 s, 63°C for 40 s,
and 72°C for 40 s, with a final extension at 72°C for
5 min. The corresponding probe was added to each
PCR to a final concentration of 3 µM, and the reaction
mixture was denatured at 95°C for 10 min and then
slowly cooled to 4°C. HRM genotyping was performed
on a Light-Scanner instrument (Idaho Technology)
with continuous melting curve acquisition (10 acquisitions
per °C) during a 0.1°C/s ramp from 40 to 95°C.
All primer and probe sequences of the developed SNP
markers are provided in Supplementary Table S1.
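
The marker naming scheme (C + contig ID, then S + SNP position) lends itself to simple parsing. The sketch below is illustrative only: `MarkerName` and `parse_marker_name` are hypothetical names, and reading the two-letter suffix seen in names such as C7493S233_CT as the two SNP alleles is an assumption, since the text defines only the C and S parts.

```python
import re
from typing import NamedTuple


class MarkerName(NamedTuple):
    contig_id: int
    position_bp: int
    alleles: str  # suffix such as "CT"; treating it as the allele pair is an assumption


# C<contig ID>S<position>, optionally followed by _<two bases>.
_MARKER_PATTERN = re.compile(r"^C(\d+)S(\d+)(?:_([ACGT]{2}))?$")


def parse_marker_name(name):
    """Split a marker name into contig ID, SNP position (bp), and suffix."""
    match = _MARKER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised marker name: {name!r}")
    contig, position, alleles = match.groups()
    return MarkerName(int(contig), int(position), alleles or "")


marker = parse_marker_name("C7493S233_CT")
```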

2.3. Population genetic analysis 

The developed SNP markers were evaluated using 
four wild, geographical populations. In total, 54 C. 
farreri individuals were used for this evaluation, of which 
24 were collected from the Jiaonan (JN) population, 12
from the Changdao (CD) population, 12 from the
Rongcheng (RC) population, and 6 from the Dalian (DL)
population. The collection details of these samples were 
described in a previous study. 21 Marker polymorphisms 
were evaluated within and among populations. For each 
marker, allele frequency, observed heterozygosity (H_o),
and expected heterozygosity (H_e), along with tests for
neutrality, Hardy-Weinberg equilibrium, and linkage dis- 
equilibrium, were calculated using POPGENE 22 or 
GENEPOP 4.0 program. 23 For Hardy-Weinberg equilib- 
rium and linkage disequilibrium tests, Bonferroni correc- 
tion was also applied to account for multiple testing. 
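
For a single biallelic SNP, the per-marker statistics named above can be illustrated with a short sketch. It does not reproduce the POPGENE or GENEPOP procedures, which may use unbiased estimators and exact tests. The function names are hypothetical, the chi-square goodness-of-fit test is a swapped-in textbook approach, and using the number of markers (1492) as the number of tests for the Bonferroni threshold is an assumption.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Heterozygosities and a 1-df chi-square HWE test from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    h_obs = n_ab / n                 # observed heterozygosity
    h_exp = 2 * p * q                # expected heterozygosity (uncorrected)
    expected = (n * p * p, n * h_exp, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return h_obs, h_exp, chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical genotype counts for 24 individuals (the JN sample size).
h_obs, h_exp, chi2, p_value = hwe_summary(6, 12, 6)
threshold = bonferroni_threshold(0.05, 1492)
```

With only 24 individuals the chi-square approximation is rough, which is one reason exact tests are often preferred for samples of this size.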

2.4. Test for cross-species transferability 

In total, 34 markers targeting 24 synonymous and 10
non-synonymous SNPs were selected for evaluating 
cross-species transferability in five scallop species from 
three Pectinidae subfamilies, i.e. Chlamydinae (Chlamys 
nobilis and Patinopecten yessoensis), Pectininae (Amusium 
pleuronectes), and Aequipectini (Argopecten irradians
and Argopecten purpuratus). All species except A. purpuratus
were purchased from local markets in China.
Argopecten purpuratus samples were kindly provided 
by Dr Chunde Wang (Qingdao Agriculture University, 
China). For each species, eight individuals were used 
to evaluate polymorphisms. 

2.5. Marker annotation and gene comparison 
with Pacific oyster 

To provide functional annotations for the SNP 
markers, relevant contig sequences were compared 
against the Swiss-Prot and Nr protein databases using 
BlastX, with an e-value threshold of 1e-5. Gene 
names were assigned to each marker based on the 
best hit. The BlastX results were imported into the 
Blast2GO software 24 for gene ontology (GO) analysis. 
GO terms were assigned to query sequences based on 
three ontology classifications, i.e. biological process, 
molecular function, and cellular component. To gain 
an overview of gene pathways, KEGG analysis was also 
performed using the KEGG Automatic Annotation 
Server. 25 The bi-directional best hit method was used 
to obtain KEGG orthology assignments. The contig 
sequences containing the SNP markers were also com- 
pared against the oyster protein database using BlastX, 
with an e-value threshold of 1e-4.



2.6. Large-scale marker associations with 
growth traits 

Large-scale marker associations with growth traits 
were conducted on an elite variety of C. farreri named 
the 'Penglai-Red' scallop (Aquacultural Variety 
Registration Number of the Ministry of Agriculture of 
China: GS02-001-2005), which was developed by our
group and has been under continuous artificial 
selection for red shell colour and fast growth for mul- 
tiple generations. A large breeding population was 
established in May 2010 by artificial fertilization of 
more than 3000 sexually mature 'Penglai-Red' scallops 
at the hatchery of the Xunshan Aquatic Group 
Corporation (Shandong Province, China). After rearing 
for 2 months, half the juvenile scallops were moved 
to another hatchery that is ~250 km from the
original hatchery. In May 2012, approximately 1000 
2-year-old scallops were randomly sampled from each 
hatchery, constituting two target populations for the 
association analyses in this study. To reduce the geno- 
typing cost, a selective genotyping strategy was 
adopted. For each population, 40 individuals with large
body sizes and 40 with small body sizes were
chosen for genotyping with the 1492 SNP markers. For
an initial scan of the whole marker set, a DNA pool was
established for each group by mixing equal amounts
of genomic DNA prepared from the individuals within
that group. The resultant DNA pools were subjected to
HRM analysis to search for allele frequency differences
between groups. Markers showing large between-group
differences in allele frequencies were further validated
by genotyping all individuals in both groups;
statistical significance was determined using a χ² test.
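
The final validation step can be illustrated with a chi-square test on allele counts from the two size groups. This is a sketch under assumptions: the text does not say whether the test was applied to allele or genotype counts, the function name is hypothetical, and the counts below are invented for illustration (40 individuals per group give 80 allele copies per group).

```python
import math


def allele_chi_square(large_counts, small_counts):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Each argument is (copies of allele 1, copies of allele 2) in one group.
    """
    a, b = large_counts
    c, d = small_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: an allele or a group has no counts")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented example: allele 1 at 75% in the large group and 50% in the small group.
chi2, p_value = allele_chi_square((60, 20), (40, 40))
```

When many markers are tested, the resulting p-value would be compared against a multiple-testing threshold such as a Bonferroni-corrected one.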



3. Results 

3.1. Transcriptome assembly and SNP mining

In total, 1 099 254 clean reads were used for transcriptome
assembly. The final assembly consisted of
50 741 contigs and 406 825 singletons. Contig sizes
ranged from 60 to 6063 bp, with an average of
541 bp. The average read and sequencing depth coverage
across all contigs were 13.6 and 7.5, respectively.
The relatively low sequencing depth limits our ability 
to detect less common or rare SNPs; nevertheless, this 
transcriptomic resource enabled us to identify a large 
number of candidate SNPs for marker development. A 
total of 21 813 putative SNPs were identified from 
18 780 contigs. The distribution of SNP coverage is
shown in Fig. 1. The overall SNP frequency was one
SNP per 1258 bp. Of these SNPs, 14 641 (67%) were
transitions, whereas 7172 (33%) were transversions.
A total of 7338 SNPs were identified from contigs for 
which gene annotation information was available. 
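
The transition/transversion split reported above follows the standard purine/pyrimidine definition, which the short sketch below makes explicit. The function name is hypothetical; the two counts are the ones given in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(allele_1, allele_2):
    """Classify a biallelic SNP as a transition or a transversion."""
    pair = {allele_1.upper(), allele_2.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Counts reported in the text.
n_transitions, n_transversions = 14641, 7172
total_snps = n_transitions + n_transversions
transition_share = n_transitions / total_snps
```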

3.2. Marker development based on the HRM
genotyping platform

The entire marker development and evaluation procedure
is depicted in Fig. 2. All putative SNPs were evaluated
for HRM primer and probe design, which led to a
successful set of primers and probes for 4500 SNPs with
a wide range of minor allele frequencies (MAFs; Table 1).

Figure 1. Distribution of sequencing coverage for the SNPs detected
in the C. farreri transcriptome.



All primers and probes were synthesized and evaluated 
using six C. farreri individuals. A total of 3042 primer 
pairs produced strong single bands with expected 
sizes. Although the expected amplicon length was 
restricted to ~100 bp during primer design, 616
primer pairs still produced bands larger than the
expected sizes, indicating potential introns in the vicinity
of those SNPs. Longer amplicons can greatly diminish
the sensitivity of HRM analysis, so these SNPs were 
excluded from further consideration. Two hundred 
and forty-seven primer pairs produced more than one 
band possibly due to non-specific amplifications. The 
remaining primer pairs resulted in poor amplifications, 
i.e. very weak or no PCR product, and were excluded 
from further consideration. Probe testing was subse- 
quently performed for primer pairs with successful 
amplifications. In total, 2231 probes generated well- 
recognizable melting curve profiles with a distinct 
peak for each allele. Of these, 1492 loci were poly- 
morphic across all assayed individuals, whereas 739 
loci were monomorphic possibly due to the small 
number of assayed individuals. The SNP validation rate 
was correlated with the MAF in the initial discovery 
panel. A higher MAF tended to lead to a higher validation
rate (Table 1). For example, at the MAF interval
of 0.4-0.5, 75% of the loci with validated probes (571 of
761) were polymorphic. In contrast, only 42% (16 of 38)
were polymorphic at the MAF interval of 0-0.1, and this
low validation rate may be largely attributable to the
relatively low quality of the predicted SNPs
(i.e. SNPs that were difficult to distinguish from sequencing
errors). Information pertaining to all 1492 SNP loci
has been submitted to the dbSNP database
(http://www.ncbi.nlm.nih.gov/snp/) under accession numbers
rs831881546-rs831883035 and rs831883061-rs831883062.

Figure 2. A schematic workflow describing SNP marker development and evaluation in C. farreri.
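
The counts reported in Section 3.2 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the number of poorly amplifying primer pairs is derived by subtraction rather than quoted.

```python
# Primer test outcomes for the 4500 designed assays (counts from the text).
designed = 4500
single_band = 3042   # strong single band of expected size
oversized = 616      # band larger than expected (possible intron)
multi_band = 247     # more than one band
poor = designed - single_band - oversized - multi_band  # weak or no product

# Probe test outcomes for primer pairs with successful amplification.
probes_ok = 2231
polymorphic = 1492
monomorphic = probes_ok - polymorphic

# Rates expressed relative to the number of designed assays.
primer_fail_rate = (designed - single_band) / designed
probe_fail_rate = (single_band - probes_ok) / designed
conversion_rate = polymorphic / designed
```

Expressed this way, about one in three designed assays ends up as a polymorphic marker, with the primer test accounting for the largest share of the losses.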

3.3. Evaluation of SNP markers for population
genomic analysis

The suitability of the 1492 polymorphic markers for
population genomic analysis was evaluated using four
geographical populations. PCR success rates across all
individuals ranged from 48 to 100% with an average
of 93%, suggesting that null alleles are present
at very low frequencies in these populations, as expected
for gene-derived markers. All the markers showed polymorphisms
across the four populations and within each
population. Marker polymorphism was 100% for the JN
population, 96% for the CD population, 97% for the RC
population, and 88% for the DL population. Population
genetic parameters were estimated for the JN population
(Supplementary Table S1), because this population
had a relatively large sample size compared with
the other populations, and therefore, genetic parameters
could be estimated more reliably. For all
markers, MAF ranged from 0.02 to 0.50. The H_o and
H_e were 0.34 and 0.37, respectively; these values
are relatively low compared with the previous estimates
for the same population using microsatellite markers. 21
In total, 1422 markers passed the neutrality test,
suggesting that these markers are not under strong
selection and are suitable for population genomic analyses
that usually require neutral markers. The majority
of markers were in Hardy-Weinberg equilibrium, and
only 81 markers showed significant departures after
Bonferroni correction. Significant linkage disequilibrium
was detected in 102 marker pairs. These markers are
valuable for detecting recent admixture and migration
events in natural populations. 26

Table 1. Overview of the efficiency of HRM-based SNP marker
development in C. farreri

Minor allele      No. of primers   No. of primers validated   No. of probes validated   No. of polymorphic loci
frequency range   designed         (% of designed)            (% of designed)           (% of designed)
0.4-0.5           1538             1056 (69%)                 761 (49%)                 571 (37%)
0.3-0.4           1422             954 (67%)                  701 (49%)                 506 (36%)
0.2-0.3           832              559 (67%)                  422 (51%)                 266 (32%)
0.1-0.2           636              418 (66%)                  309 (49%)                 133 (21%)
0-0.1             72               55 (76%)                   38 (53%)                  16 (22%)

3.4. Cross-species transferability test 

To assess the cross-species transferability of the developed
SNP markers, 34 markers targeting 24 synonymous
and 10 non-synonymous SNPs were selected for
HRM genotyping in five scallop species from
three Pectinidae subfamilies. A total of 23 were successfully
amplified in at least one species, of which 16 were
polymorphic in at least one species. The highest transferability
(up to four species) was observed for marker
C10745S115_CG (Table 2). Unexpectedly, the total
number of transferable markers was quite similar (6-7
markers) for the three Pectinidae subfamilies, which
does not reflect the difference in their phylogenetic
relationships to C. farreri (i.e. C. farreri belongs to
Chlamydinae, which is more closely related to
Pectininae than to Aequipectini 27). The markers that
could be transferred to at least three species were all
synonymous SNPs, whereas all but one of the non-synonymous
SNPs were transferable only to a single
species.

Table 2. The cross-species transferability of 16 C. farreri SNP markers

Columns: Marker name; Chlamys nobilis; Patinopecten yessoensis; Amusium pleuronectes; Argopecten irradians; Argopecten purpuratus
[Table body not recoverable; surviving marker names: C10745S115_CG, C12208S460_TC]

Non-synonymous SNP markers are indicated in grey.

Figure 3. Schematic representation of the correspondence between eight C. farreri SNP markers and Pacific oyster scaffold419 (genome
assembly v9). Oyster proteins and the related C. farreri SNPs are indicated below the scaffold.



3.5. Functional annotation and gene comparison with 
Pacific oyster 

Gene annotation was performed for the 1492 
markers using BlastX comparison against the Swiss-Prot 
and Nr protein databases. A total of 1095 (73.4%) 
markers had significant matches to known proteins in 
these databases, corresponding to 953 unique acces- 
sions (Supplementary Table S1). GO analysis revealed 
that one or more GO terms could be assigned to 466 
markers for a total of 923 GO assignments. The anno- 
tated markers were involved in diverse biological pro- 
cesses and functions, and their GO composition largely 
resembled that summarized for the total set of contigs 
(Supplementary Fig. S1). KEGG pathway analysis 
revealed that 420 markers were involved in 123 different
pathways (Supplementary Table S2). In particular,
21 markers were involved in the immune system, and
these markers are worthy of further evaluation to determine
whether any are associated with disease/pathogen
resistance.

To evaluate the utility of the 1492 SNP markers for
future comparative analysis with the oyster genome,
the contig sequences of these markers were compared
against the oyster protein database. A total of 963 C.
farreri contigs containing 1034 SNP markers showed
significant sequence homology to oyster proteins (e-value
of <1e-4; Supplementary Table S1). One
example showing the correspondence between eight
SNP markers and oyster scaffold419 is presented in
Fig. 3. These oyster proteins are distributed on 514
genomic scaffolds (genome assembly v9; Table 3).
The lengths of these scaffolds ranged from 2 to 1965
kb. A total of 235 scaffolds were matched by at least
two C. farreri markers, and 39 scaffolds were matched
by at least five markers (Table 3). These markers and
their matching oyster scaffolds provide an important
basis for further genome comparison between the two
species.

Table 3. Summary of the oyster scaffolds matched by C. farreri SNP
markers

Total marker no. matched   No. of matched     Length of matched
to each scaffold           oyster scaffolds   scaffolds (kb)
12                         1                  1620
10                         1                  1193
9                          1                  1726
8                          7                  649-1965
7                          4                  969-1120
6                          10                 650-1697
5                          15                 538-1855
4                          25                 63-1372
3                          63                 58-1861
2                          108                31-1727
1                          279                2-1100
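
The summary figures quoted for Table 3 can be re-derived from its first two columns. The sketch below is only a consistency check: the row values are those printed in Table 3, and the helper names are hypothetical.

```python
# (markers matched per scaffold, number of oyster scaffolds), as printed in Table 3.
TABLE_3_ROWS = [
    (12, 1), (10, 1), (9, 1), (8, 7), (7, 4), (6, 10),
    (5, 15), (4, 25), (3, 63), (2, 108), (1, 279),
]


def scaffolds_with_at_least(rows, min_markers):
    """Number of scaffolds matched by at least min_markers markers."""
    return sum(n for markers, n in rows if markers >= min_markers)


def markers_on_scaffolds_with_at_least(rows, min_markers):
    """Number of markers located on scaffolds with at least min_markers markers."""
    return sum(markers * n for markers, n in rows if markers >= min_markers)


total_scaffolds = scaffolds_with_at_least(TABLE_3_ROWS, 1)
total_markers = markers_on_scaffolds_with_at_least(TABLE_3_ROWS, 1)
```

The same helpers give the number of scaffolds matched by at least two or at least five markers, which is how the counts in the paragraph above relate to the table.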

3.6. Large-scale marker associations with growth traits 
Based on the developed marker set, we further con- 
ducted a large-scale association analysis of growth 
traits using two C. farreri populations that were 
derived from a large breeding population of an elite C. 
farreri variety ('Penglai-Red' scallop), but were reared 
in two different geographic locations. Two populations 
were used in the association analysis to ensure identifi- 
cation of SNP loci linked with the genetic variance of 
traits but not the environmental effects. To reduce the 
genotyping cost, a selective genotyping approach was 
adopted, in which only two groups of individuals 
sampled from the two tails of the trait distribution 
were used for HRM analysis (Fig. 4). An initial scan of 
the whole marker set on DNA pools revealed that five
markers showed prominent allele frequency differences
between the two groups in both populations (Fig. 5).
The statistical significance of the observed allele frequency
differences was further confirmed for the five markers by
genotyping each individual within each group for both
populations (Table 4). After Bonferroni correction, four
markers remained significant in at least one population.
Gene annotation information was available for only one
marker, C7493S233_CT (SDF4, 45 kDa calcium-binding
protein, 7e-19).

Figure 4. The shell-height distribution of approximately 1000 C. farreri individuals collected from each of two C. farreri populations. For the
large body size (LBS) groups, the average trait value was 75.59 ± 2.92 mm for population a and 72.02 ± 2.57 mm for population b,
whereas for small body size (SBS) groups, it was 49.93 ± 3.84 and 53.04 ± 3.41 mm for populations a and b, respectively. For each
population, 40 individuals were sampled from each group for association analysis.

4. Discussion 

4.1. Large-scale marker development from NGS-generated
transcriptomic resources

NGS technologies are now frequently being used to 
generate extensive genomic resources for non-model 
organisms. Although a large number of SNPs can be 
readily discovered from these resources, large-scale 
marker development and evaluation are seldom per- 
formed for such in silico predicted SNPs. In this study, 
we conducted the first transcriptome-wide marker development
from more than 21,000 putative SNPs for a
bivalve mollusc, which currently represents the largest
gene-associated marker collection for the phylum
Mollusca. Although stringent criteria were adopted for
transcriptome assembly, the marker conversion rate
remained relatively low (~33%), but was still comparable
with those reported in recent studies using
the same genotyping platform. 14-16 The inefficient
marker conversion can largely be attributed to low
passing rates in the primer and probe tests, i.e. 32.4%
failed the primer test and 18% failed the probe test.
This issue may be directly related to a few well-known
drawbacks of NGS platforms. For example, NGS platforms
usually generate a substantially higher rate of
sequencing errors than traditional Sanger-based
methods. In particular, the 454 sequencing platform
is prone to introducing more indel errors than other
platforms, especially when stretches of homopolymeric
bases are present. 28-30 Sequencing errors, when occurring
at positions where a primer or probe binds, may
hamper the amplification efficiency and confound subsequent
HRM analysis. In addition, sequencing errors
are also a major source of falsely predicted SNPs. As
revealed by this study, there is a trend for SNPs with
higher MAF to have higher validation rates. This result
suggests that SNPs with high MAF (e.g. 40-50%) are
less likely to be affected by sequencing errors and
thus should be given high priority during marker development.
Furthermore, NGS platforms usually produce
very short reads (e.g. 35-100 bp); accurate de novo
assembly of such short reads poses a significant informatics
challenge. 31 Assembly artefacts can arise when
reads of different genomic origins are misassembled together
due to high similarity of sequence context. Such
artefacts may result in PCR failure when primers are targeting
two genomic regions far from each other. One
interesting finding is that introns were present at a substantial
frequency, even though we restricted the
expected amplicon length to ~100 bp during primer
design. Approximately 14% of primer pairs produced
amplicons with a larger than expected size, indicating
that introns were present in the vicinity of SNPs. It
should be noted that this percentage most likely underestimates
the actual occurrence of introns, because very
large introns would more likely result in PCR failure
rather than successful amplicons of larger than
expected size. The aforementioned issues cannot be
easily amended without the aid of a high-quality reference
genome. However, with the rapid development of
sequencing technologies (i.e. much longer reads and
higher accuracy) and more draft genomes being
generated for non-model species, there is much hope
that these issues can be resolved.

Figure 5. Example HRM profiles for two growth-related markers, C10973S307_GA and C7493S233_CT, identified by genotyping DNA pools
prepared from both the LBS and SBS groups. Clear differences in allele frequencies were observed between groups.



4.2. Genic versus non-genic markers in molluscan
population studies 
Currently, the genetic markers available for molluscs 
were developed predominantly from anonymous 
genomic sequences. It is well known that molluscan 
genomes are typically highly heterozygous. 4,5 For 
example, the Pacific oyster has an average SNP density 
of one SNP per 60 bp in coding regions and one per 
40 bp in non-coding regions. 32 Such high genome het- 
erozygosity can result in a high frequency of null alleles, 
i.e. alleles that fail to amplify due to random mutations in
primer-binding regions. Extremely high proportions of 
null alleles have been observed for the microsatellite 
markers developed in C. farreri (56%). 33 Application 
of these markers in population genetic studies can se- 
verely distort the estimation of population structure 
and differentiation, parentage analysis, and assignment 
tests, 34 and may result in misleading conclusions or
inferences. In contrast, genetic markers developed 
from transcriptomic sequences can, to some extent, al- 
leviate this problem, because transcribed regions are 
usually more conserved than non-transcribed regions. 
In this study, the average PCR success rate across all 
assayed individuals was 93%, suggesting that null 
alleles were indeed present at very low frequencies, as 
expected for gene-derived markers. 

Remarkably, high population transferability was observed
for the developed SNP markers, the vast majority
of which were also neutral markers. Our study,
therefore, provides a valuable set of SNP markers that
are suitable for population genomic studies. A major
benefit of utilizing gene-associated markers is that it is
possible to quickly identify causal genes under
natural selection, which usually involves identifying
outlier SNP markers showing significantly increased or
decreased differentiation among populations compared
with neutral expectations. 35 Scaling up the
number of available gene-associated SNPs to thousands
of markers extends the genome coverage, thereby
increasing the probability of identifying outlier loci
that are in tight linkage disequilibrium with loci under
selection.



4.3. Cross-species transferability and comparative 
genomics 

Gene-associated markers generally exhibit higher
cross-species transferability in closely related species
than markers developed from anonymous genomic
sequences. However, to what extent SNP markers developed
from one species are transferable to others within
the Pectinidae family remains unknown. In this study,
about half of the tested markers were polymorphic
in at least one of the five scallop species, with the
most polymorphic marker transferable to up to four
species, indicating the high cross-species transferability
of the developed SNP markers. However, marker
transferability across the three Pectinidae subfamilies
showed no correlation with their phylogenetic relationships
to C. farreri. This is somewhat unexpected and suggests
that these transferable SNPs may largely
represent ancestral genetic variations that have been
preserved differentially among Pectinidae subfamilies.
Our study also revealed that SNPs with high transferability
(≥3 species) were all synonymous SNPs. This
makes sense because synonymous SNPs do not alter
the encoded amino acids and thus are less likely to be
removed by purifying selection. Once such a polymorphism
had arisen, it could be preserved for a long period of
time during the evolution of the Pectinidae family.

The high cross-species transferability, together with
the high annotation rate (~73%) of the developed
markers, holds great promise for conducting comparative
genome analysis among molluscs. The recently released
Pacific oyster genome represents the best assembled 
genome currently available for bivalve molluscs. 4 It pro- 
vides a valuable genomic resource not only for oyster 
genetic and breeding studies, but also for comparative 
genome analysis among molluscs. One of our ongoing 
projects is to construct a high-density linkage map for 
C. farreri based on the markers developed here. In the 
present study, we made an initial assessment of the use- 
fulness of the developed markers (if they were included 
in a future linkage map) for comparative genome ana- 
lysis between the two bivalve species. Of the annotated 
markers, 69% could be anchored onto the Pacific 
oyster genome, and 250 markers could be linked to
very large oyster scaffolds that are longer than 500 kb 
and matched by at least five markers. Therefore, once 
the high-density map is constructed, these markers 
would help identify conserved genomic regions (i.e. 
synteny blocks) between the two species. 

4.4. Large-scale marker association analysis 
of growth traits 

Identification of major genes responsible for growth 
traits is highly desirable in scallop breeding pro- 
grammes for the purpose of genetic improvement. 
Gene-associated markers are valuable tools for fulfilling
such a task, because they have great potential for quickly
identifying causal genes underlying the trait of interest. 
The extensive marker set generated by this study pro- 
vides an unprecedented opportunity to conduct a large- 
scale association analysis of growth traits in C. farreri. 
Using a selective genotyping approach, five growth- 
related markers were identified and confirmed in two C. 
farreri populations, where allele frequencies were associated
with variation in growth traits. Unfortunately, four
of these markers are currently uncharacterized, and 
therefore, their putative roles in growth regulation 
cannot be inferred. The marker C7493S233_CT was 
the only marker with annotation information (SDF4, 
45 kDa calcium-binding protein). This gene belongs 
to the calcium-binding protein family, a large group of 
proteins that can regulate a variety of cellular processes including
cell division, differentiation, motility, and apoptosis.
36 Mutations in calcium-binding proteins have
been associated with variation in growth traits, as de- 
monstrated in Chinese cattle. 37 Currently, it is largely 
unknown which genes or genetic loci are involved in 
scallop growth regulation. Our study offers the first 
report of candidate genes responsible for scallop growth 
regulation and provides markers for further genetic 
improvement of C. farreri in breeding programmes. 

4.5. Conclusions 

We developed for the first time large-scale gene- 
associated SNP markers for a bivalve mollusc, which 



currently represents the largest gene-associated marker 
collection in the phylum Mollusca. The properties of 
high polymorphism within and across populations, 
low frequency of null alleles, neutrality, and high 
cross-species transferability make these markers 
highly valuable for population genomic, comparative 
genomic, and genome-wide association studies. 